Vaccine development against hepatitis C virus (HCV) is hindered by poor understanding of the factors defining cross-immunoreactivity among heterogeneous epitopes. Using synthetic peptides and mouse immunization as a model, we conducted a quantitative analysis of cross-immunoreactivity among variants of the HCV hypervariable region 1 (HVR1). Analysis of 26,883 immunological reactions among pairs of peptides showed that the distribution of cross-immunoreactivity among HVR1 variants was skewed, with antibodies against a few variants reacting with all tested peptides. HVR1 cross-immunoreactivity was accurately modeled based on amino acid sequence alone. The tested peptides were mapped in the HVR1 sequence space, which was visualized as a network of 11,319 sequences. HVR1 variants with greater network centrality showed broader cross-immunoreactivity. The entire sequence space is explored by each HCV genotype and subtype. These findings indicate that HVR1 antigenic diversity is extensively convergent and effectively limited, which has significant implications for vaccine development.

Hepatitis C virus (HCV) is a single-stranded RNA virus belonging to the family Flaviviridae. HCV infects 3.0% of the world's population and is a major cause of liver disease worldwide. HCV infection progresses to chronicity in 70%-85% of infected adults. An estimated 476,000 deaths per year are attributed to hepatitis C. However, there is no vaccine against HCV, and current antiviral therapy is effective in 50%-60% of patients. HCV is genetically very heterogeneous and is classified into 6 genotypes and numerous subgenotypes.

Vaccines are among the most efficacious means of controlling infectious diseases. However, the development of vaccines against highly heterogeneous viruses such as HCV and human immunodeficiency virus (HIV) is considerably hampered by variant-specific neutralizing immune responses. These viruses have a seemingly unlimited capacity to mutate rapidly and escape immune neutralization, which presents a major obstacle to formulating broadly protective vaccines. Considering that an estimated 130 million and 33 million individuals are infected worldwide with HCV and HIV, respectively, and that each infected host harbours a large variety of viral variants, the number of viral variants circulating in the world is immense. Developing vaccines against such a broad range of viral variants seems a daunting task.

Classical approaches to vaccine development have yet to produce broadly protective vaccines against HCV and HIV. Novel vaccine strategies recently developed to cope with viral antigenic diversity focus on using epitopes with limited heterogeneity, generating a concoction of heterogeneous epitopes or mimotopes, or predicting consensus sequences, center-of-tree variants or phylogenetic ancestors. These strategies are based on specific assumptions regarding the properties of highly heterogeneous epitopes, viz., that immunological specificity is strongly linked to the epitope primary structure, with cross-immunoreactivity (CR) declining with increasing genetic difference between epitopes, and that the viral sequence space is shaped by diversifying evolution resulting from an "arms race". However, the relevance of these assumptions has not been systematically corroborated.

The most important HCV neutralizing epitope has been mapped to hypervariable region 1 (HVR1), located at amino acid (aa) positions 384-410 of the structural protein E2. HVR1 sequence variation correlates with neutralization escape and is associated with viral persistence during chronic infection. Although some neutralizing epitopes have been discovered in conserved regions of HCV structural proteins, the variant-specificity of protective humoral responses points to the essential role played by the variable epitopes in controlling HCV infection.

In the present work, a quantitative analysis of HVR1 CR, modeled using synthetic peptides and mouse immunization, in conjunction with a network analysis of the HVR1 sequence space, showed significant immunological and structural HVR1 convergence. The findings suggest that HVR1 immunological specificity is tractable, and offer a novel framework for HCV vaccine development that is applicable to other heterogeneous viruses.
Results

Convergence of HVR1 CR. CR analysis among HVR1 variants using human serum specimens is complicated by the multi-specificity of the humoral response against the numerous HCV variants present in a given infected host. To overcome this problem, many groups have used mice to study the immune response to the HVR1 epitope. Several studies have shown the specific reactivity of HVR1 peptides with sera of infected individuals. We modeled HVR1 CR using serum specimens from mice immunized with 103 synthetic peptides representing different HVR1 variants (referred to as immunogens). A set of 261 HVR1 peptides was used as antigens in an enzyme immunoassay, of which 102 sequences were also used as immunogens. All HVR1 variants were randomly selected from major branches of a phylogenetic tree constructed using sequences available in GenBank. A total of 165,120 immunoassay reactions were evaluated. Among 26,883 unique immunogen-antigen pairs, 5,039 (18.7%) were cross-immunoreactive (Fig. S1 and Table S1).

Many HVR1 variants derived from different genotypes were found to be cross-immunoreactive (Fig. S2). The CR frequency was 16.2% and 20.6% between inter- and intra-genotypic pairs, respectively. This finding points to the independent occurrence of identical immunological specificity among phylogenetically unrelated variants, indicating the existence of immunological homoplasy. There was an inverse relationship between the Hamming distance and CR frequency among pairs of HVR1 variants (Fig. 1A). Thus, increasing genetic distance between viral variants was associated with a decline in the capacity of antibody raised against one variant to recognize another. However, this distance plot (Fig. 1A) shows two intriguing findings. First, 12.4% of variants differing from each other at 21-25 positions were cross-immunoreactive. Such an extreme difference in aa composition is another strong indication of the independent origin of identical immunological specificities, which suggests convergence of HVR1 immunological properties. Second, 18.4% of all tested variants did not show self-immunoreactivity. Although most of these peptides showed CR with others (ranging from 0% to 34.9%), the group of immunogens that did not self-react showed a lower average CR level than the group of immunogens that self-reacted (MRPP test; p = 0.0001).
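The distance analysis above amounts to grouping immunogen-antigen pairs by Hamming distance and computing the fraction that cross-reacted in each group. The Python sketch below illustrates this under stated assumptions: the peptides are aligned and of equal length, each pair already carries a positive/negative call (the positivity criterion is not described in this section), and the toy sequences are hypothetical rather than study data.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def cr_frequency_by_distance(pairs):
    """Fraction of cross-reactive pairs at each Hamming distance.

    pairs: iterable of (immunogen_seq, antigen_seq, reactive) tuples, where
    `reactive` is the positive/negative immunoassay call for that pair.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for immunogen, antigen, reactive in pairs:
        d = hamming(immunogen, antigen)
        totals[d] += 1
        positives[d] += int(reactive)
    return {d: positives[d] / totals[d] for d in sorted(totals)}


# Hypothetical toy pairs (not HVR1 peptides from the study).
toy_pairs = [
    ("ETHVTG", "ETHVTG", True),   # self-reaction, distance 0
    ("ETHVTG", "ETHVSG", True),   # distance 1
    ("ETHVTG", "QTHVSG", False),  # distance 2
    ("ETHVTG", "QTRVSA", True),   # distance 4
    ("ETHVTG", "QSRVSA", False),  # distance 5
]
print(cr_frequency_by_distance(toy_pairs))
# {0: 1.0, 1: 1.0, 2: 0.0, 4: 1.0, 5: 0.0}
```

Binning by distance range (for example, 21-25 positions) is a matter of summing the per-distance totals and positives before dividing.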

The last observation suggests a separation between immunogenic and antigenic HVR1 specificities under the experimental conditions. This observation was further extended by analysis of the range of antibody and peptide CR (Fig. 1B). The frequency distributions for immunogens (median 8.8%) and antigens (median 18.4%) were distinctly different (Kolmogorov-Smirnov test, p < 0.0001): 53.4% of the immunogens reacted with <10% of antigens, and only 2.9% of immunogens showed broad immunoreactivity with >90% of peptides, the maximum being 97.7% for HVR1 variant P240 (genotype 1a). These observations show a highly non-uniform CR distribution among HVR1 immunogens. In contrast, the frequency distribution of antigen CR was centered at the average value, and none of the antigens reacted with >52.5% of the immunogens (Fig. 1B). This difference between the antigen and immunogen distributions was also reflected in the identification of 2 minimal sets of peptides. One set (n = 3; peptides P032, P240 and P247) collectively elicited antibodies immunoreacting with all 261 antigens. The other set (n = 14) immunoreacted with antibodies against 101 of the 103 immunogens. The observed separation between antigenic and immunogenic properties may be explained by differences in the conformational states of the conjugated peptides used for immunization and the free peptides used for antibody detection, indicating conformation dependence of the HVR1 antigenic epitopes.
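This section does not say how the two minimal peptide sets were identified. Finding such a set is an instance of the set-cover problem; the sketch below uses the standard greedy heuristic, a swapped-in technique that is not necessarily the procedure used in the study and is not guaranteed to return a true minimum (an exact answer needs exhaustive search or integer programming). The immunogen and antigen names are hypothetical.

```python
def greedy_cover(reactivity, universe):
    """Greedy set cover: pick immunogens until their antisera jointly recognize
    every antigen in `universe`.

    reactivity: dict mapping immunogen name -> set of antigens its antiserum
    recognized. Ties are broken by dictionary order. The result is a cover,
    but not necessarily a minimum one.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Immunogen whose antiserum recognizes the most still-uncovered antigens.
        best = max(reactivity, key=lambda name: len(reactivity[name] & uncovered))
        gain = reactivity[best] & uncovered
        if not gain:
            raise ValueError(f"antigens recognized by no antiserum: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical reactivity table (names are illustrative only).
reactivity = {
    "I1": {"A1", "A2"},
    "I2": {"A2", "A3", "A4"},
    "I3": {"A4", "A5"},
    "I4": {"A1", "A5"},
}
antigens = {"A1", "A2", "A3", "A4", "A5"}
print(greedy_cover(reactivity, antigens))  # ['I2', 'I4']
```

The antigen-side set (antigens that together react with antisera against most immunogens) follows from the same function after inverting the table so that each antigen maps to the immunogens whose antisera it reacted with.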

To further evaluate HVR1 CR, we constructed a network in which each node is an HVR1 sequence and two nodes are linked if the reaction between the two corresponding peptides was positive (Fig. 1C). All HVR1 sequences were connected in a single giant component. Topological analysis showed that this network is a "small world", with a small average shortest path between every pair of reachable vertices (2.04 steps) and a small diameter (the maximum shortest path between two vertices; 5 steps). The network also lacks identifiable modules or communities, suggesting an overall homogeneity of HVR1 immune recognition. These findings emphasize the pervasive incidence of broad CR among HVR1 variants. To reduce bias that may be introduced by the asymmetry of this network, which is composed of different sets of immunogens and antigens, we extracted the symmetric subnetwork consisting only of the 102 peptides that were used both as immunogens and as antigens. For this subnetwork, the average CR was 20%, and the average shortest path and diameter were 2.01 and 4, respectively, similar to those of the total network.

Figure 1 | A) Relationship between CR and genetic distance. The Hamming distance is the number of differing aa positions. B) CR distribution: percentage of peptides found in each CR bin. The numbers below bins show the upper limit of the CR values. C) CR network. Each node is an HVR1 sequence and two nodes are linked if the reaction between the two peptides was positive.
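The two path measures quoted above, the average shortest path over reachable pairs and the diameter, can be computed with a breadth-first search from every node, because all links in the CR network have equal length. A minimal Python sketch, assuming the network is stored as a hypothetical adjacency dictionary of directed immunogen-to-antigen links (add both directions to treat it as undirected). It also reports the fraction of ordered pairs that are reachable, the kind of quantity given as 89.3% for the subnetwork.

```python
from collections import deque
from itertools import chain


def bfs_distances(adj, source):
    """Unweighted shortest-path lengths (in links) from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def path_statistics(adj):
    """Return (average shortest path, diameter, fraction of reachable pairs).

    adj: dict mapping each node to the nodes it links to (directed links).
    Averages are taken over ordered pairs (u, v), u != v, where v is reachable
    from u; the diameter is the longest of these shortest paths.
    """
    nodes = set(adj) | set(chain.from_iterable(adj.values()))
    lengths = []
    for u in nodes:
        lengths.extend(d for v, d in bfs_distances(adj, u).items() if v != u)
    if not lengths:
        raise ValueError("network has no reachable pairs")
    n = len(nodes)
    return sum(lengths) / len(lengths), max(lengths), len(lengths) / (n * (n - 1))


# Hypothetical directed CR links: a->b, b->c, c->a, c->d.
toy_adj = {"a": ["b"], "b": ["c"], "c": ["a", "d"]}
avg, diameter, reachable = path_statistics(toy_adj)
print(round(avg, 3), diameter, reachable)  # 1.667 3 0.75
```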
It is important to note that there is no correlation between the CR of a peptide as immunogen and as antigen (Fig. S3). Moreover, only 11% of all links in the subnetwork were bidirectional, indicating that the majority of peptides used in this study had different immunogenic and antigenic specificities. Nonetheless, it was still possible to follow a path from a given variant to almost any other in a few steps, with 89.3% of all possible pairs being reachable. We also calculated, for each node, the clustering coefficient, defined as the ratio between the number of links among the neighbors of a node and the number of links they could have if they formed a fully connected subgraph, or clique. The subnetwork had a high degree of local structure, with an average clustering coefficient of 0.6658; such a high value is related to the large number of cliques (n = 2,979) in the network. Each clique contained 3-17 nodes (average 12.72). A clique in the network can be viewed as a set of variants sharing a single immunological specificity.
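Link reciprocity and the local clustering coefficient described above can be sketched as follows. This assumes that reciprocity is counted over directed links and that clustering is computed on the undirected version of the network, with nodes of degree below 2 assigned a coefficient of 0; these conventions are not stated in the text, and other choices change the numbers. The toy links are hypothetical.

```python
from itertools import combinations


def reciprocity(edges):
    """Fraction of directed links u->v (u != v) whose reverse v->u also exists."""
    links = {(u, v) for u, v in edges if u != v}
    return sum((v, u) in links for u, v in links) / len(links)


def clustering_coefficients(edges):
    """Local clustering coefficient of every node in the undirected network:
    links among a node's neighbours divided by the k*(k-1)/2 possible links.
    Nodes with fewer than two neighbours get 0.0 (a convention)."""
    nbrs = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-reactions
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    coeff = {}
    for node, ns in nbrs.items():
        k = len(ns)
        if k < 2:
            coeff[node] = 0.0
            continue
        linked = sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
        coeff[node] = linked / (k * (k - 1) / 2)
    return coeff


# Hypothetical directed CR links; only a<->b is bidirectional.
toy_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"), ("c", "d")]
print(reciprocity(toy_edges))  # 0.4
cc = clustering_coefficients(toy_edges)
print(sum(cc.values()) / len(cc))  # average clustering coefficient
```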

Convergence of HVR1 sequence space. To study the structure of the HVR1 sequence space, we compiled 11,319 HVR1 sequences from 3,172 patients. The full dataset contained two classes of sequences: (i) multiple clonal variants obtained from a single time-point from 176 patients, and from 2-14 time-points from 29 patients; and (ii) a single consensus sequence from each of 2,967 independent patients. Of all sequences, 65.4% were obtained from the Los Alamos HCV Sequence Database, and 34.6% were clonal variants generated in our laboratory using end-point limiting-dilution PCR. The final dataset included 4,757 unique aa sequences: 3,432 of genotype 1, 727 of genotype 2, 235 of genotype 3, 103 of genotype 4, 75 of genotype 5, 43 of genotype 6, and 142 of unknown genotype. When only epidemiologically unrelated sequences are considered, the percentage of sequence variation due to differences among genotypes is 4.7 times higher at the nucleotide level (average 20.76%) than at the aa level (4.39%). Even though these average differences among genotypes are statistically significant, the maximum likelihood trees of nucleotide or aa sequences do not show clear genotype-specific clusters (Figs. S4 and S5). The lack of genotype-specific clustering is characteristic only of HVR1, since inclusion of the flanking regions from E1 and/or E2 completely restores the phylogenetic relationships among HCV sequences.

Taking into consideration a potential convergence among HVR1 aa sequences and the dimensionality problem (see Methods for details), we studied different ways of portraying this sequence space, giving preference to approaches that could accurately model distances among closely related sequences. The sequence space was visualized using a Pathfinder network (PFNET) based on the entropy-weighted Hamming distances between the 4,757 unique HVR1 aa sequences (Fig. 2A). Each unique aa sequence is represented by a node in the PFNET, and each link connects nodes of the highest similarity.
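The PFNET construction is only named here, so the sketch below fills in two assumptions that should be checked against the study's own description: each mismatch is weighted by the Shannon entropy of its alignment column, and the Pathfinder parameters are r = infinity and q = n - 1, the common setting in which a link survives only if no alternative path has a smaller maximum link weight. The minimax distances are computed with a Floyd-Warshall style recurrence, which is O(n^3) and far too slow in pure Python for 4,757 sequences; the toy alignment is hypothetical and only shows the logic.

```python
import math
from collections import Counter


def column_entropies(seqs):
    """Shannon entropy (bits) of each column of an ungapped alignment."""
    n = len(seqs)
    entropies = []
    for column in zip(*seqs):
        counts = Counter(column)
        entropies.append(-sum(c / n * math.log2(c / n) for c in counts.values()))
    return entropies


def weighted_hamming(a, b, weights):
    """Sum of the column weights at positions where a and b differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def pathfinder_edges(seqs):
    """PFNET(r=infinity, q=n-1): keep link i-j only if no other path between
    i and j has a smaller maximum link weight (its minimax distance)."""
    weights = column_entropies(seqs)  # assumed weighting scheme
    n = len(seqs)
    dist = [[weighted_hamming(seqs[i], seqs[j], weights) for j in range(n)]
            for i in range(n)]
    minimax = [row[:] for row in dist]
    for k in range(n):  # Floyd-Warshall with (min, max) instead of (min, +)
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    return [(i, j, dist[i][j]) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] <= minimax[i][j]]


# Hypothetical 4-residue toy alignment.
toy_seqs = ["AAAA", "AAAT", "AATT", "TTTT"]
print([(i, j) for i, j, _ in pathfinder_edges(toy_seqs)])  # [(0, 1), (1, 2), (2, 3)]
```

With these parameters the retained links are the union of all minimum spanning trees of the distance graph, so for large datasets an MST-based construction is the practical route.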

The low level of differentiation among genotypes at the aa level is also evident in the lack of genotype-specific clusters in the PFNET model of sequence space (Fig. 2A), which shows that HVR1 sequences from different HCV genotypes explore the same space at the aa level. Only 3.15% of the variability in path distances among epidemiologically unrelated sequences can be attributed to genotype differences. A similar observation was made for the two most prevalent subtypes in the PFNET, 1a and 1b. These findings suggest significant HVR1 convergence among HCV genotypes, with each genotype representing essentially the entire HVR1 sequence space. Additionally, we measured the shortest path between any two sequences over the entire PFNET. The diameter and the average shortest path were calculated for populations of intra-host HVR1 variants from (i) single time-points at the acute stage, (ii) single time-points during chronic infection, (iii) multiple time-points during follow-up, and (iv) variants of different genotypes. Both measures reflect the breadth of distribution of variants in the PFNET. Although there was, in general, a significant difference in both the diameter and average distance among all four sets (MRPP, p = 0.0001), an overlap in these values was observed at different evolutionary levels (Fig. 2B). All pairwise comparisons of the diameter and average distance between sets were significantly different (Table S2), except between the chronic and follow-up sets (MRPP, p = 0.7305).

Figure 2 | A) Location of HVR1 variants from 6 HCV genotypes in the sequence space modeled with PFNET. Each genotype is shown in a different color. B) Exploration of the PFNET by different HVR1 samples. The acute group consists of 34 single time-point samples, the chronic group of 90 single time-point samples, the follow-up group of 29 clusters and the genotype group of 8 clusters. C) Exploration of sequence space by HVR1 variants from 5 follow-up patients. Patients 1001 to 1004 and patient 7001 were described previously.
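MRPP is used repeatedly in this section to compare groups such as the acute, chronic, follow-up and genotype sets. The sketch below is a permutation version for one-dimensional measurements (for example, per-sample diameters), assuming absolute difference as the distance and group weights n_i/N; the study's exact MRPP settings (distance measure, weighting, number of permutations) are not given here, and the data are hypothetical. With n_perm permutations the smallest attainable p-value is 1/(n_perm + 1).

```python
import random
from itertools import combinations


def mrpp_delta(values, labels):
    """MRPP statistic: weighted mean of within-group average pairwise distances,
    with |a - b| as the distance and group weights n_i / N."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    total = len(values)
    delta = 0.0
    for members in groups.values():
        if len(members) < 2:
            continue
        diffs = [abs(a - b) for a, b in combinations(members, 2)]
        delta += len(members) / total * sum(diffs) / len(diffs)
    return delta


def mrpp_test(values, labels, n_perm=9999, seed=0):
    """Permutation p-value: share of label shufflings whose delta is at most the
    observed delta (small delta = tight groups), counting the observed one."""
    rng = random.Random(seed)
    observed = mrpp_delta(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mrpp_delta(values, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical per-sample diameters for two groups.
diameters = [1.0, 1.2, 0.9, 1.1, 5.0, 5.3, 4.8, 5.1]
groups = ["acute"] * 4 + ["chronic"] * 4
delta, p = mrpp_test(diameters, groups, n_perm=999)
print(round(delta, 4), p)
```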

Although most of the single time-point samples had a small diameter, some samples contained variants spread far from each other in the PFNET and had a diameter greater than that of follow-up patients. Another observation was that variants from some chronic cases were so scattered in the PFNET that their diameter and average distance overlapped with the measures for some HCV genotypes, indicating that HVR1 variants generated during intra-host evolution in some patients may explore the HVR1 space at a scale similar to subgenotypes and genotypes. Fig. 2C shows the range of exploration of the PFNET sequence space by variants from five patients who were followed up for 8.8 to 26 years. HVR1 variants from patient 1001 (genotype 1a) are represented by 540 sequences collected over a period of 16.0 years at 13 time-points; from patient 1002 (1a) by 497 sequences collected over 18.2 years at 14 time-points; from patient 1003 (1b) by 180 sequences collected over 8.8 years at 5 time-points; from patient 1004 (1a) by 574 sequences collected over 16.0 years at 10 time-points; and from patient 7001 (1a) by 52 sequences collected over 26.0 years at 8 time-points. In each of two patients, 1001 and 7001, variants were found in different regions of the PFNET. Collectively, these observations suggest that the HVR1 space is small: it is entirely scanned by each genotype and can be widely traversed by intra-host HVR1 variants. However, although the space is too small to accommodate genotypes and subgenotypes independently, it is sufficiently large to provide significant flexibility for intra-host HVR1 evolution, which would facilitate HCV escape from neutralizing immune responses.

CR distribution in the HVR1 sequence space. All peptides used in this study were mapped in the PFNET, showing a uniform distribution of the tested variants in the modeled sequence space (Fig. 3A). Cross-immunoreactive pairs of sequences were linked to visualize the CR distribution in this network. Although this visualization showed that CR occurred among widely scattered peptides rather than being confined to particular regions of the network, the high density of links precluded the detection of any meaningful structure in this distribution. Further analysis of topological parameters of the PFNET showed that closeness centrality, measured as the average shortest path from a given node to all other nodes in the network, is significantly associated with the CR range of the peptides (Fig. 3B). The average CR of immunogens with the highest centrality is 3.37 times greater than that of immunogens with the lowest centrality (MRPP, p = 0.0099). A similar, albeit less distinct, trend was observed for the antigens, with an average CR 1.64 times greater for the most central than for the least central (MRPP, p = 0.0099). This observation suggests that the HVR1 sequence space as modeled with PFNET is associated with CR and, therefore, links HCV evolution with specific immunological recognition of HVR1 epitopes.
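Closeness centrality is defined above as the average shortest path from a node to all other nodes, so smaller values mark more central nodes. A sketch using Dijkstra's algorithm on a weighted, undirected PFNET stored as a hypothetical adjacency list of (neighbour, weight) pairs; it assumes every node can reach at least one other node (a PFNET built from a complete distance matrix is connected). The toy chain is illustrative only.

```python
import heapq
import math


def dijkstra(adj, source):
    """Weighted shortest-path distances from `source`."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def average_path_length(adj):
    """Average shortest-path distance from each node to all other nodes;
    smaller values mean a more central node."""
    result = {}
    for u in adj:
        others = [d for v, d in dijkstra(adj, u).items() if v != u]
        result[u] = sum(others) / len(others)
    return result


# Hypothetical PFNET fragment: a chain 0 - 1 - 2 - 3 with unit link weights.
toy_pfnet = {
    0: [(1, 1.0)],
    1: [(0, 1.0), (2, 1.0)],
    2: [(1, 1.0), (3, 1.0)],
    3: [(2, 1.0)],
}
print(average_path_length(toy_pfnet))  # nodes 1 and 2 have the smallest value, 4/3
```

Sorting nodes by this value and cutting them into five equal-width or equal-count bins would give groups comparable to those in Fig. 3B; which binning the study used is not stated here.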

Predictive model of HVR1 CR. The significant functional and structural convergence observed in this study indicates that the same HVR1 properties evolved frequently and independently in the occupied sequence space. Such regular occurrence of a limited number of immunological specificities renders them tractable and highly amenable to predictive modeling. Therefore, we generated a classification model in the form of a decision tree associating HVR1 sequence with CR. This simple model predicted CR based on the sequences of the immunogen and antigen pairs (Fig. S6). A 10-fold cross-validation showed that the average testing accuracy was 92.5% (S.D. 0.4%), with average sensitivity and specificity of 99.7% (S.D. 0.2%) and 85.3% (S.D. 0.7%), respectively. Using this model, we predicted global cross-immunoreactivity among all possible pairs of sequences in the PFNET (n = 2.26 × 10^7). A strong inverse relationship between the distance of two peptides and the predicted CR frequency was found (Fig. 4A). Likewise, the distributions of predicted and observed cross-immunoreactivity are similar for both antigens and immunogens (Fig. 4B). Although most of the immunogens had a low predicted breadth of immunoreactivity, we found 73 sequences (1.53% of all analyzed) with predicted CR of >99% in different parts of the PFNET (Figs. S7 and S8). These variants may be viewed as highly cross-immunoreactive HVR1 mimotopes, supporting vaccine development strategies that rely on the discovery of broadly immunoreactive epitopes.

Figure 3 | A) PFNET map of the experimentally tested HVR1 variants and their CR. The sequences used as both immunogen and antigen are shown in red (n = 103), the sequences used only as antigen are shown in blue (n = 261) and the sequence used only as immunogen is shown in purple (n = 1). The light green lines link cross-immunoreactive variants. B) HVR1 CR of peptides according to their PFNET centrality. All nodes were divided into 5 bins from most peripheral (left) to most central (right). The numbers below bins show the upper limit of the closeness centrality values. The standard error of the mean is shown as black bars.
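The cross-validation reported above can be outlined without committing to a particular tree implementation. The sketch below builds stratified folds, scores each held-out fold for accuracy, sensitivity (reactive pairs correctly called) and specificity (non-reactive pairs correctly called), and reports the mean and S.D. across folds. The learner is a hypothetical stand-in (a single Hamming-distance threshold), not the study's decision tree, and stratification is an assumption made so that every fold contains both classes; a library classifier could be wrapped to fit the same `fit` interface.

```python
import random
import statistics


def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds, dealing each class out round-robin so every
    fold holds both reactive and non-reactive pairs."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (True, False):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fold_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for one held-out fold."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum(not t and not p for t, p in zip(y_true, y_pred))
    fp = sum(not t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    return {"accuracy": (tp + tn) / len(y_true),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}


def cross_validate(X, y, fit, k=10, seed=0):
    """Mean and S.D. of each metric over k held-out folds.
    `fit(train_X, train_y)` must return a function mapping one x to True/False."""
    scores = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(fold_metrics([y[i] for i in test_idx],
                                   [predict(X[i]) for i in test_idx]))
    return {m: (statistics.mean([s[m] for s in scores]),
                statistics.stdev([s[m] for s in scores])) for m in scores[0]}


def fit_distance_threshold(train_X, train_y):
    """Hypothetical stand-in learner: call a pair cross-reactive when its Hamming
    distance is at most the threshold that best fits the training pairs."""
    best = max(range(max(train_X) + 1),
               key=lambda t: sum((x <= t) == label for x, label in zip(train_X, train_y)))
    return lambda x: x <= best


# Hypothetical data: 5 pairs at each distance 0-27, reactive when distance <= 10.
X = [d for d in range(28) for _ in range(5)]
y = [d <= 10 for d in X]
summary = cross_validate(X, y, fit_distance_threshold)
for metric, (mean, sd) in summary.items():
    print(f"{metric}: {mean:.3f} (S.D. {sd:.3f})")
```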

Since the experimentally tested peptides represent a small fraction of all HVR1 sequences, they are substantially different from each other and cannot be used to evaluate CR among the very closely related variants expected at a single time-point in individual patients. However, the predicted data can be used to assess such local CR by calculating, for each HVR1 variant, the CR percentage to its 5 closest sequences in the full HVR1 dataset (Fig. 4C). Local CR is generally higher than global CR because the closer two peptides are, the more likely they are to cross-immunoreact. Although local and global CR distributions were similar for immunogens, the antigen distributions were strikingly different, with the local distribution being considerably skewed (Fig. 4C). However, most HVR1 variants were predicted to be largely non-immunoreactive with each other (52.8% of the immunogens and 39.3% of the antigens do not cross-immunoreact locally). This finding is consistent with the suggested role of HVR1 in HCV escape from immune responses during human infection.

Figure 4 | A) Relationship between predicted CR and Hamming distance. B) Predicted global CR distribution among 4,757 HVR1 variants. The average global CR of the immunogens is 18.1% (S.E. = 0.3) and of the antigens 18.1% (S.E. = 0.1). C) Predicted local CR distribution among 4,757 HVR1 variants. The average local CR of the immunogens is 27.4% (S.E. = 0.5) and of the antigens 28.6% (S.E. = 0.5).
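Local CR as defined above, the percentage of a variant's 5 closest sequences that it is predicted to cross-react with, can be sketched as follows. The predictor is passed in as a function; here it is a hypothetical stand-in (Hamming distance of at most 1) rather than the study's decision tree, and ties among equally distant neighbours are broken by list order, which is an assumption. Swapping the argument order in `predict_cr` gives the antigen-side local CR.

```python
def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def local_cr(seqs, predict_cr, k=5):
    """Percentage of each sequence's k closest other sequences (by Hamming
    distance; ties broken by list order) that it is predicted to cross-react
    with when acting as immunogen."""
    result = []
    for i, s in enumerate(seqs):
        neighbours = sorted((hamming(s, t), j) for j, t in enumerate(seqs) if j != i)[:k]
        hits = sum(bool(predict_cr(s, seqs[j])) for _, j in neighbours)
        result.append(100.0 * hits / len(neighbours))
    return result


def toy_predictor(immunogen, antigen):
    """Hypothetical stand-in for a trained CR model."""
    return hamming(immunogen, antigen) <= 1


toy_seqs = ["AAAA", "AAAT", "AATT", "ATTT", "TTTT", "TTTA"]
print(local_cr(toy_seqs, toy_predictor)[:2])  # [20.0, 40.0]
```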

The predicted CR was compared between the aforementioned acutely infected (34 patients, 130 unique aa sequences) and chronically infected patients (90 patients, 799 unique aa sequences). The average global CR was significantly lower (MRPP, p = 0.0155) in the acutely infected patients (average = 10.60; S.E. = 1.84) than in the chronically infected patients (average = 17.10; S.E. = 1.46). This finding is consistent with previous observations made for HCV-infected patients. The data shown here suggest, however, that the lower immunogenic CR in the acute-phase specimens is related to HVR1 properties and not only to the lower heterogeneity of intra-host populations usually found in acutely infected patients. Additionally, these observations indicate that the predictive model not only captures HVR1 CR under the specific experimental conditions but also reflects some HVR1 immunological properties existing during human HCV infections. Nevertheless, considering the conformation dependence of HVR1 CR (Fig. 1) discussed above, it should be noted that the observed specificity of immunological reactions will vary depending on the experimental conditions. The high accuracy of the decision tree model (Fig. S6) indicates that HVR1 structural properties are strongly linked to the specificity of immunoreactivity, suggesting that such models can be used to cope with HCV heterogeneity in vaccine design. However, these models should be generated with consideration of the experimental conditions and of HVR1 presentation in different protein contexts.

Discussion 

Development of vaccines against HCV and other genetically heterogeneous viruses is hindered by a lack of understanding of the factors that define the relationship between genetic variability and the immunological specificity of antigenic epitopes. Elucidation of CR among variable antigenic epitopes is crucial for vaccine development. In the present work, we have modeled HVR1 CR and its distribution in sequence space, and formulated a novel concept of antigenic convergence for highly heterogeneous antigenic epitopes.

CR among HVR1 variants was readily observed in early HCV experiments. Using well-characterized antibodies, several studies confirmed these early observations and showed that CR is most probably associated with HVR1 structural constraints. These findings prompted a search for natural variants and mimotopes with broad HVR1 CR. However, these studies did not succeed in establishing a quantitative association between CR and HVR1 heterogeneity. For example, Jackson et al. conducted an extensive examination of CR (maximum = 31.1%; mean = 9.53%) among a number of HVR1 peptides but found no firm connection between immunological reactivity and sequence. Detection of CR (maximum = 83.3%; mean = 36.7%) among HVR1 variants derived from different HCV genotypes and the highly variable degree of immunoreactivity of sera with peptides containing HVR1 variants from the same patients clearly illustrate the lack of a simple association between sequence similarity and CR.

It must be noted that CR is essential but not sufficient for cross-neutralization. Nevertheless, given that immune recognition is a necessary step in viral neutralization, modeling of CR among highly variable antigenic epitopes constitutes a critical starting point toward overcoming genetic diversity in the development of an efficacious HCV vaccine. Here, we made several important observations useful for harnessing CR. First, the HCV HVR1 sequence space was found to be explored in its entirety by different HCV genotypes and subtypes, indicating that the space is smaller than might be expected from the number of existing HCV strains. This also supports the use of HVR1 sequence variants from a single subtype or genotype for vaccine development. Second, in agreement with previous studies, we found broad CR among genetically distant HVR1 variants, including variants from different HCV genotypes. The skewed CR distribution implies that many HVR1 variants act as natural mimotopes. We showed that antibodies against a few HVR1 variants were immunoreactive with all tested peptides. Thus, in addition to the sequence space limitations, the number of HVR1 immunological specificities is also lower than might be expected from the number of HCV strains. The findings support vaccine development strategies that rely on the discovery of broadly cross-immunoreactive epitopes. Third, the predictive models generated in this study showed for the first time that HVR1 CR can be predicted based on sequence alone and that there is a strong association between CR and HVR1 structure. Such in silico models should significantly facilitate the rational exploitation of HVR1 CR for vaccine development. Fourth, the general separation between antigenic and immunogenic properties suggests that selection of antigens for vaccine development should not be based solely on identifying HVR1 variants with broad antigenic reactivity; rather, immunogenic reactivity should also be assessed. Fifth, the observed immunological convergence is defined by the strong conformational dependence of HVR1 epitopes, implying that the specificity of HVR1 CR may vary depending on the model system used for epitope presentation. Therefore, translation of specific CR results from a model, for example one based on synthetic peptides and mouse immunization, to human vaccine development must be approached carefully. HVR1 CR should be modeled under conditions approximating the actual vaccine application.

Although all observations of CR in this study were made using a model system of synthetic peptides and mouse immunization, antigenic convergence is most probably a common property of highly heterogeneous antigenic epitopes. This conclusion is strongly supported by the recently reported identification, among 10,000 random-sequence peptides, of specific antigenic reactivity to all tested antibodies. The statistically significant associations of CR frequency with sequence space centrality and with the stage of HCV infection described in this paper strongly suggest that the model presented here captured a general distribution of immunological specificities among HVR1 variants.

It is commonly expected that CR declines rapidly with increasing genetic difference between epitopes. This assumption, taken in the context of an expanding viral sequence space shaped by diversifying evolution, is generally considered a basis for immune escape and provides the major framework for vaccine development against highly heterogeneous viruses. The data obtained in this study indicate that this assumption is strongly consistent with the CR observed among the closely related HVR1 variants usually found in each infected patient. However, frequent CR among genetically distant HVR1 variants significantly weakens the inverse relationship between genetic distance and CR (Fig. 1A), indicating that the number of immunological specificities distributed across the entire HVR1 sequence space is limited. This restriction has a different bearing on intra- and inter-host evolution. The sequence space available to each HCV variant provides ample opportunity for successful intra-host HVR1 evolution, facilitating immune escape during chronic infection. However, HVR1 convergence significantly limits inter-host HCV evolution. This reduction in genetic and immunological HCV diversity presents a different perspective on vaccine development than the concept of continuous diversifying viral evolution.

The HCV HVR1 convergence observed in this study should be viewed as the natural extension of diversifying evolution occurring in a limited sequence space. Owing to strong structural and functional constraints, the HVR1 sequence space is severely restricted in size. Expansion under recurring selection pressures can result in "exhaustion" of the limited sequence space, thereby generating conditions for frequent occurrences of homoplasy. The data obtained here suggest that extensive convergence is a principal factor affecting the distribution of immunological properties among HVR1 variants. As a result, identical immunological specificities are repeatedly observed among genetically distant variants, contributing to widespread epitope mimicry in the HVR1 sequence space. In contrast to diversification, convergence makes HVR1 variability tractable. Thus, the homoplastic nature of HVR1 diversity turns extensive genetic variability from a drawback into an opportunity, and offers a novel and more general framework for the development of a viable hepatitis C vaccine.

The interaction between virus and host is frequently compared to an "arms race", an idea rooted in diversifying viral evolution driven by antagonistic relationships with the host. Although such a process might explain viral escape from host immune responses, it is not instructive for vaccine development. A game of chess is a more useful metaphor for HCV evolution. The chessboard symbolizes the limited intra-host phenotypic space available to HCV, which is composed of character states (immunological specificities) represented as squares. Each square may be occupied by different pieces (genetic variants), resulting in homoplasy. HCV frequently cheats, making several mutation-moves at once and rapidly traversing the entire space-board, a strategy that results in persistent infection. With a very similar "game" played during each infection, convergence is ingrained in HCV evolution.

Focusing on sequence differences rather than on characteristics shared
among viral variants hinders the discovery of phenotypic convergence.
Convergent patterns of HCV immune responses pave the way for new
prevention strategies that exploit the homoplastic nature of the HCV
sequence space. The concepts of convergence and tractability of diverse
immunological specificities presented here should be applicable to
other genetically heterogeneous viruses, such as HIV and influenza
virus, thereby contributing to the rational design of vaccines against
these infections.

Methods 

Synthetic peptides. A set of 262 HVR1 sequences was selected from GenBank
sequences as described in Results: 192 sequences belonged to genotype 1, 26 to
genotype 2, 35 to genotype 3, 2 to genotype 4, 1 to genotype 5 and 6 to genotype 6. A
control set of 29 synthetic peptides was derived from proteins encoded by the unrelated
hepatitis delta virus and hepatitis G virus. These irrelevant peptides of different sizes
were used as negative controls to account for non-specific immunoreactivity of
mouse serum specimens in all enzyme immunoassay experiments. All peptides were
synthesized using standard Fmoc chemistry. Peptide quality was assessed by
matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass
spectrometry and high-performance liquid chromatography (HPLC). All peptides used
in these experiments were of comparable quality.

Immunization of mice. For immunization of mice with synthetic peptides, 103
HVR1 peptides were first conjugated to BSA as a carrier protein. Each conjugated
peptide was diluted with 1× PBS to a final concentration of 1 mg/ml and mixed with
an equal volume of the adjuvant TiterMax (CytRx Corp, Norcross, GA). Each group
of three 4-6-week-old female BALB/c mice was injected intraperitoneally with 100 µl
of the resulting emulsion (50 µg of peptide) containing a single BSA-conjugated
HVR1 peptide. Two weeks after the first injection, a booster of 50 µg of
BSA-conjugated peptide (without adjuvant) was administered. Two weeks after the
booster, mice were bled and serum specimens were obtained.
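The arithmetic behind the "50 µg of peptide" figure can be written out as a quick check. The sketch assumes, as the text implies, that the 1 mg/ml stock is diluted two-fold by the equal volume of adjuvant. The total of 309 mice is derived from 103 groups of three rather than stated in the Methods, and the constant and function names are hypothetical.

```python
# Hypothetical constants taken from the immunization protocol described above.
STOCK_MG_PER_ML = 1.0   # conjugated peptide diluted in 1x PBS
MIX_DILUTION = 2        # mixed with an equal volume of TiterMax -> two-fold dilution
INJECTION_UL = 100      # emulsion volume injected intraperitoneally per mouse
N_IMMUNOGENS = 103      # HVR1 peptides used as immunogens
MICE_PER_GROUP = 3


def dose_ug(stock_mg_per_ml: float, dilution: int, volume_ul: float) -> float:
    """Micrograms of peptide in `volume_ul` of the diluted emulsion (1 mg/ml == 1 ug/ul)."""
    return stock_mg_per_ml / dilution * volume_ul


dose = dose_ug(STOCK_MG_PER_ML, MIX_DILUTION, INJECTION_UL)
mice_total = N_IMMUNOGENS * MICE_PER_GROUP  # derived, not stated in the text
print(f"{dose} ug per injection; {mice_total} mice in total")
```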

Enzyme immunoassay (EIA). Synthetic peptides representing 261 HVR1 variant
sequences of 31 aa in length obtained from GenBank, together with 29 unrelated
peptides, were used as antigens in the assay. Wells of 384-well high-binding plates
were coated overnight at room temperature (RT) with 0.25 µg of HVR1 peptide in
25 µl/well of PBS. Each well was coated with a different peptide, and 16 wells were
coated with 1.25 µg of BSA as an immunoreactivity control. After incubation, wells
were washed five times with 100 µl of PBS/0.05% Tween 20 (PBST), and 25 µl of
serum diluted 1:400 in PBST/10% normal goat serum (NGS) was added. After the
plates were incubated at 37 °C for 90 min, wells were washed again five times with
PBST, and 25 µl of horseradish peroxidase-labeled anti-mouse IgG antibody diluted
1:50,000 in PBST/10% NGS was added to each well. After five further washes with
PBST, 25 µl of 1-Step Ultra TMB-ELISA was added for color development for 30 min
at RT, followed by 25 µl of 2 M H2SO4 to stop the reaction, and plates were read at
450 nm (OD450).

Negative controls. Reproducibility of experimental conditions between tests
conducted on the same day or on different days is essential for accurate
mathematical modeling of the results. To control for inter-plate variation in each
experiment, each 384-well plate contained 16 BSA-coated wells: 8 were tested
against a 1:30,000 dilution of a pool of sera from BSA-immunized mice and 8
against a 1:15,000 dilution. The inter-plate variation was low (S.D.
5.47%). A reaction between a serum and a peptide was considered positive if it
showed a signal/cut-off (S/CO) value > 1, with cut-offs calculated from two types
of negative controls:

(i) Negative control A: a cut-off was calculated for each peptide, based on its
reactivity with 35 sera obtained from mice, each immunized with a different
irrelevant BSA-conjugated peptide using the protocol described above.
The cut-off value was calculated as the mean of the negative controls plus 3.5 times
the standard deviation. The average cut-off across all peptides was 0.07547.

(ii) Negative control B: a cut-off was calculated for each serum, based on its
reactivity with 29 different unrelated peptides. The cut-off value was calculated
as the mean of the negative controls plus 3.5 times the standard deviation. The
average cut-off across all sera was 0.08567.
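The cut-off rule lends itself to a short sketch. It assumes that "standard deviation" means the sample standard deviation. Because the text does not say how the two cut-offs are combined, it also assumes that a reaction counts as positive only when S/CO > 1 against both. The function names `cutoff` and `is_positive` are hypothetical, and the OD450 values are invented for illustration.

```python
import statistics


def cutoff(negative_od450):
    """Mean of negative-control OD450 values plus 3.5 standard deviations."""
    # Assumption: sample standard deviation (n - 1 denominator).
    return statistics.mean(negative_od450) + 3.5 * statistics.stdev(negative_od450)


def is_positive(od450, peptide_cutoff, serum_cutoff):
    """Assumed rule: S/CO must exceed 1 for both the control A and control B cut-offs."""
    return od450 / peptide_cutoff > 1 and od450 / serum_cutoff > 1


# Toy OD450 readings (illustrative values, not study data).
peptide_cut = cutoff([0.04, 0.05, 0.06])  # control A: one peptide vs. irrelevant-peptide sera
serum_cut = cutoff([0.06, 0.07, 0.08])    # control B: one serum vs. unrelated peptides
print(round(peptide_cut, 3), round(serum_cut, 3))
print(is_positive(0.20, peptide_cut, serum_cut), is_positive(0.09, peptide_cut, serum_cut))
```

Combining the cut-offs with `and` is the stricter reading. If either cut-off alone sufficed, the `and` would become `or`.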

The reactivity of each immunogen-antigen pair was scored as 1 if one or more
of the 3 immunized mice showed a positive reaction. Supplementary Fig. S1A
shows the number of mice with a positive reaction for each tested immunogen-antigen
pair. The percentage of cross-immunoreactive reactions was 18.74% with one or more
mice, 7.09% with two or more mice and 2.43% with all three mice. The
average agreement between mice immunized with the same BSA-conjugated peptide
was 88.78 (S.D. 14.98).
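The scoring and tabulation steps can be sketched as follows. The input format and the function name `summarize` are hypothetical. The input maps each immunogen-antigen pair to its number of positive mice, from 0 to 3. The sketch only reproduces the "one or more mice" scoring rule and the one-or-more, two-or-more and three-mice percentages described above. With 103 immunogens tested against 261 antigens there are 103 × 261 = 26,883 pairs, consistent with the number of reactions analysed in the study.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (immunogen, antigen)


def summarize(positive_mice: Dict[Pair, int], n_mice: int = 3):
    """Score pairs and tabulate the percentage of pairs with >= k positive mice."""
    total = len(positive_mice)
    # A pair is scored 1 (cross-immunoreactive) if at least one mouse reacted.
    scores = {pair: int(count >= 1) for pair, count in positive_mice.items()}
    percent = {
        k: 100.0 * sum(count >= k for count in positive_mice.values()) / total
        for k in range(1, n_mice + 1)
    }
    return scores, percent


N_PAIRS = 103 * 261  # immunogens x antigens

# Toy counts for four pairs (illustrative only).
toy = {("p1", "a1"): 0, ("p1", "a2"): 1, ("p2", "a1"): 2, ("p2", "a2"): 3}
scores, percent = summarize(toy)
print(N_PAIRS, scores, percent)
```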
Network construction and analysis. A CR network is a directed graph in which each
vertex is an HVR1 sequence and two vertices are linked if the reaction
between the corresponding peptides was positive. Several topological measures of this
network were calculated using the PAJEK software*^. Because the CR network is
difficult to visualize owing to its high density of links, we used the k-shell
decomposition method to disentangle its hierarchical structure*^.
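The sketch below shows k-shell decomposition by iterative pruning. It is not the PAJEK implementation. It ignores link direction and uses each node's number of distinct neighbours as its degree. This is an assumption, because the text does not specify how direction was handled. The function name `k_shells` and the node labels are hypothetical.

```python
from collections import defaultdict


def k_shells(links):
    """Map each node to its k-shell index by iteratively pruning low-degree nodes.

    Link direction is ignored and self-loops are skipped (assumptions).
    """
    neighbours = defaultdict(set)
    for u, v in links:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in neighbours.items()}
    remaining = set(degree)
    shell = {}
    k = 0
    while remaining:
        queue = [node for node in remaining if degree[node] <= k]
        if not queue:
            k += 1  # nothing left to prune at this level; move to the next shell
            continue
        while queue:
            node = queue.pop()
            if node not in remaining:
                continue  # already pruned earlier in this round
            remaining.discard(node)
            shell[node] = k
            # Pruning a node lowers its neighbours' degrees, which can cascade.
            for nbr in neighbours[node]:
                if nbr in remaining:
                    degree[nbr] -= 1
                    if degree[nbr] <= k:
                        queue.append(nbr)
    return shell


# Toy CR links (immunogen -> antigen): a triangle plus one pendant node.
print(k_shells([("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")]))
```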

Minimal cover set. The minimal cover set was defined as the smallest set of
immunogens that together have positive reactions with the whole set of antigens.
Finding this set is a binary integer optimization problem, which was solved using a
linear programming (LP)-based branch-and-bound algorithm implemented in MATLAB
(MathWorks, Natick, MA).
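For small instances the same problem can be solved exactly by exhaustive search, which makes the formulation concrete. Choose a binary $x_i$ for each immunogen $i$ to minimize $\sum_i x_i$ subject to $\sum_{i:\,a_{ij}=1} x_i \ge 1$ for every antigen $j$, where $a_{ij} = 1$ if immunogen $i$ reacted with antigen $j$. The sketch below is a brute-force substitute for illustration. It is not the LP-based branch-and-bound solver used in the study, and its cost grows exponentially. The name `minimal_cover` is hypothetical.

```python
from itertools import combinations


def minimal_cover(reacts, antigens):
    """Smallest list of immunogens whose positive reactions jointly cover `antigens`.

    reacts maps each immunogen to the set of antigens it reacted with.
    Subsets are tried in order of increasing size, so the first hit is minimal.
    """
    immunogens = sorted(reacts)
    for size in range(0, len(immunogens) + 1):
        for subset in combinations(immunogens, size):
            covered = set().union(*(reacts[i] for i in subset))
            if antigens <= covered:
                return list(subset)
    return None  # at least one antigen reacts with no immunogen


# Toy reaction sets (illustrative only).
reacts = {"p1": {"a1", "a2"}, "p2": {"a2", "a3"}, "p3": {"a3", "a4"}, "p4": {"a1", "a4"}}
print(minimal_cover(reacts, {"a1", "a2", "a3", "a4"}))
```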

Data from acute and chronic samples. To determine differences in diameter and CR
between the acute and chronic stages of HCV infection, two groups of samples were
used: (A) the acute group included 130 different aa sequences from 34 single time-point
samples^^-™-^'-^^; (B) the chronic group included 799 different aa sequences from
90 single time-point samples tested in our laboratory^^.
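This section does not define "diameter". The sketch below assumes it is the largest pairwise aa Hamming distance among the distinct sequences of a single sample. That is a common usage but an assumption here, and the function names are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def diameter(sample_sequences):
    """Assumed definition: largest pairwise Hamming distance within one sample."""
    distinct = sorted(set(sample_sequences))
    return max((hamming(a, b) for a, b in combinations(distinct, 2)), default=0)


# Toy sample of aligned aa variants (illustrative only).
print(diameter(["AAAA", "AAAT", "TTAA", "AAAA"]))
```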

Sequence space model. The whole dataset contained sequences of two classes: (A)
multiple clonal variants obtained in our laboratory from single time-point samples
from 176 patients using end-point limiting-dilution (EPLD) real-time PCR^*',
multiple clonal variants from published data on 23 samples^', and multiple clones
from follow-up specimens (2-14 time-points) from 29 patients^"'^^'^^''^; and (B)
sequences obtained by consensus sequencing from 2967 unrelated patients, which were
retrieved from the Los Alamos HCV Sequence Database^^ in early 2008.
Recombinant and chimeric sequences, patented sequences and sequences
obtained from non-human hosts were excluded from the analysis.

All sequences can be viewed as points in an informational space known as sequence
space. We studied different ways of portraying this sequence space, taking into
consideration two important problems: aa convergence and the curse of dimensionality.
(i) aa convergence: different aa may have very similar physicochemical properties or
fulfill the same function in this particular segment; it is therefore desirable to build
a sequence space that takes these similarities into account. For HVR1, we assume that
some positions are less variable because changes at those positions affect its structure
and function more than changes at highly variable positions. The entropy of each aa
position was calculated for the whole dataset, and these entropy values were used to
weight the Hamming distances between every pair of sequences. (ii) Curse of
dimensionality: one of the most important goals in visualizing data is to get a sense of
how near or far points are from each other. However, as dimensionality increases, the
distance from a given point to the nearest point approaches the distance to the
farthest point^^, so distances in sequence space lose discriminatory power very
rapidly. Accordingly, the analysis focused on accurately portraying local
relationships. We created a model of sequence space using PFNET^^, which in essence
prunes a dense network. PFNETs derive more accurate local structures than other
algorithms, whose resulting relationships between neighboring points often differ
significantly from the original data^^. The network generation procedure incorporates
two parameters: (1) the r parameter defines the metric used for computing the distance
of paths; (2) the q parameter constrains the number of indirect proximities examined in
generating the network. The network with the minimum number of links is obtained when
$q = n - 1$ and $r = \infty$, i.e., PFNET$(n - 1, \infty)$, where $n$ is the number of nodes.
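The two steps described above can be sketched together. The first computes entropy-weighted Hamming distances, and the second prunes them to PFNET$(n-1,\infty)$. Entropy here is Shannon entropy in bits per aligned position. The weight `1 / (1 + H)` is a hypothetical choice, because the text does not give the exact weighting; it makes mismatches at conserved positions count more. With $r = \infty$ and $q = n - 1$, a link survives only if no alternative path has a smaller maximum link weight. Pruning therefore reduces to a minimax variant of Floyd-Warshall. This is a plain $O(n^3)$ sketch, not the PFNET software used in the study.

```python
import math
from collections import Counter


def column_entropy(seqs):
    """Shannon entropy (bits) of each position in a set of aligned, equal-length sequences."""
    n = len(seqs)
    return [
        sum(-(c / n) * math.log2(c / n) for c in Counter(column).values())
        for column in zip(*seqs)
    ]


def weighted_hamming(a, b, weights):
    """Sum of position weights over mismatched positions."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def pfnet_inf(dist):
    """Links kept by PFNET(n-1, inf) for a symmetric distance matrix."""
    n = len(dist)
    minimax = [row[:] for row in dist]
    # Floyd-Warshall with max() in place of +: minimax[i][j] becomes the smallest
    # achievable "largest single link" over all paths from i to j.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    # A direct link survives only if no indirect path beats it.
    return [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i][j] <= minimax[i][j]]


# Toy aligned peptides (illustrative, not HVR1 data).
seqs = ["QTAVA", "QTAVS", "QTGVS", "HTGLS"]
entropy = column_entropy(seqs)
weights = [1.0 / (1.0 + h) for h in entropy]  # hypothetical weighting
dist = [[weighted_hamming(a, b, weights) for b in seqs] for a in seqs]
print(pfnet_inf(dist))
```

For $r = \infty$ and $q = n - 1$ the retained links form the union of all minimum spanning trees of the distance graph. This is why this PFNET variant has the fewest links.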

Analysis of molecular variance (AMOVA). Each genotype was considered a
subpopulation of the HVR1 sequence space, and distances among these subpopulations
were further explored by AMOVA as implemented in ARLEQUIN^^. The genetic
structure was analyzed taking into account the molecular differences between
sequences in addition to differences in their frequencies. Three types of distance
were used: nucleotide Hamming distance, aa Hamming distance and path distance (the
number of steps in the shortest path connecting a pair of peptides in the PFNET).
Significance levels of the variance components were estimated from 10,000 permutations.
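Path distance, as defined above, is the number of links on the shortest path between two peptides in the PFNET, and a breadth-first search computes it directly. The sketch treats the PFNET as undirected and takes its links as node pairs. The function name is hypothetical, and this is not the ARLEQUIN input pipeline.

```python
from collections import defaultdict, deque


def path_distances_from(links, source):
    """Number of links on the shortest path from `source` to every reachable node."""
    adjacency = defaultdict(set)
    for u, v in links:
        adjacency[u].add(v)
        adjacency[v].add(u)  # PFNET links treated as undirected
    steps = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in steps:
                steps[nbr] = steps[node] + 1
                queue.append(nbr)
    return steps  # nodes absent from the result are unreachable


links = [("s1", "s2"), ("s2", "s3"), ("s3", "s4"), ("s1", "s5")]
print(path_distances_from(links, "s4"))
```

Running the search once from each sequence yields the full pairwise path-distance matrix.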

Multi-response Permutation Procedure (MRPP). MRPP is a non-parametric
permutation test of the hypothesis of no difference between two or more
groups of entities^*; it was performed as implemented in the program BLOSSOM^^. The
distribution of values under the null hypothesis of no difference between groups was
obtained by permuting the grouping labels (n = 10,000).
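An MRPP test can be sketched in a few lines. The statistic δ used here is the within-group mean pairwise distance, averaged with group weights $n_g/N$. This is one common choice in the MRPP literature, but the weighting used in BLOSSOM for this study is not stated. The weighting, the (hits + 1)/(permutations + 1) p-value convention and the function names are therefore assumptions for illustration.

```python
import random
from itertools import combinations


def mrpp_delta(dist, labels):
    """Weighted mean within-group distance, with weights n_g / N."""
    n = len(labels)
    delta = 0.0
    for group in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == group]
        pairs = list(combinations(members, 2))
        if not pairs:
            continue  # singleton groups contribute no within-group distance
        mean_d = sum(dist[i][j] for i, j in pairs) / len(pairs)
        delta += len(members) / n * mean_d
    return delta


def mrpp(dist, labels, permutations=10_000, seed=0):
    """Return (observed delta, permutation p-value); small delta means tight groups."""
    rng = random.Random(seed)
    observed = mrpp_delta(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # permute grouping labels under the null hypothesis
        if mrpp_delta(dist, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy example: two well-separated groups of three points on a line.
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
labels = ["g1", "g1", "g1", "g2", "g2", "g2"]
dist = [[abs(a - b) for b in points] for a in points]
delta, p = mrpp(dist, labels, permutations=999)
print(round(delta, 3), p)
```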

Ethics statement. (i) Mouse experiments: This study was carried out in strict
accordance with the recommendations of the Guide for the Care and Use of
Laboratory Animals of the National Institutes of Health. Protocols for mouse
immunization experiments were approved by the Centers for Disease Control
and Prevention Animal Care and Use Committee (1403KHUMOUC). All efforts
were made to minimize suffering. (ii) Human subjects: Blood specimens from 4
follow-up patients were acquired with written informed consent, and all research
involving human participants was approved by the institutional review board
(Centers for Disease Control and Prevention, IRB#1428). A description of the
NHANES III study can be found in^^. Clinical investigations were conducted
according to the principles expressed in the Declaration of Helsinki.